Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MON-2693: Scrape profiles #1785

Merged
merged 12 commits into from
Mar 3, 2023

Conversation

JoaoBraveCoding
Copy link
Contributor

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@JoaoBraveCoding JoaoBraveCoding marked this pull request as draft September 30, 2022 16:55
@JoaoBraveCoding JoaoBraveCoding changed the title Draft: MON-2693: Scrape profiles MON-2693: Scrape profiles Sep 30, 2022
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 30, 2022
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 30, 2022

p.Spec.ServiceMonitorSelector = labelSelector
p.Spec.PodMonitorSelector = labelSelector
p.Spec.ProbeSelector = labelSelector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to leave ProbeSelector untouched as it's always nil.

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMO would need to deploy all new service monitors so they can be picked up by Prometheus?

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 20, 2022
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 17, 2023
@simonpasquier
Copy link
Contributor

@JoaoBraveCoding it's ok to undraft it, isn't it?

@JoaoBraveCoding JoaoBraveCoding marked this pull request as ready for review January 25, 2023 08:57
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 25, 2023
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start! Should we first implement full & operational profiles only?

pkg/manifests/types.go Outdated Show resolved Hide resolved
jsonnet/components/control-plane.libsonnet Outdated Show resolved Hide resolved
pkg/manifests/manifests.go Outdated Show resolved Hide resolved
pkg/tasks/nodeexporter.go Outdated Show resolved Hide resolved
pkg/manifests/manifests.go Outdated Show resolved Hide resolved
pkg/manifests/types.go Outdated Show resolved Hide resolved
@JoaoBraveCoding
Copy link
Contributor Author

/retest-required

- action: labeldrop
regex: instance
- action: keep
regex: (kube_daemonset_status_current_number_scheduled|kube_daemonset_status_desired_number_scheduled|kube_daemonset_status_number_available|kube_daemonset_status_number_misscheduled|kube_daemonset_status_updated_number_scheduled|kube_deployment_metadata_generation|kube_deployment_spec_replicas|kube_deployment_status_observed_generation|kube_deployment_status_replicas_available|kube_deployment_status_replicas_updated|kube_horizontalpodautoscaler_spec_max_replicas|kube_horizontalpodautoscaler_spec_min_replicas|kube_horizontalpodautoscaler_status_current_replicas|kube_horizontalpodautoscaler_status_desired_replicas|kube_job_failed|kube_job_status_active|kube_job_status_start_time|kube_node_info|kube_node_labels|kube_node_role|kube_node_spec_taint|kube_node_spec_unschedulable|kube_node_status_allocatable|kube_node_status_capacity|kube_node_status_condition|kube_persistentvolume_info|kube_persistentvolume_status_phase|kube_persistentvolumeclaim_access_mode|kube_persistentvolumeclaim_info|kube_persistentvolumeclaim_labels|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_pod_container_resource_limits|kube_pod_container_resource_requests|kube_pod_container_status_last_terminated_reason|kube_pod_container_status_restarts_total|kube_pod_container_status_waiting_reason|kube_pod_info|kube_pod_owner|kube_pod_status_phase|kube_pod_status_ready|kube_pod_status_unschedulable|kube_poddisruptionbudget_status_current_healthy|kube_poddisruptionbudget_status_desired_healthy|kube_poddisruptionbudget_status_expected_pods|kube_replicaset_owner|kube_replicationcontroller_owner|kube_resourcequota|kube_state_metrics_list_total|kube_state_metrics_watch_total|kube_statefulset_metadata_generation|kube_statefulset_replicas|kube_statefulset_status_current_revision|kube_statefulset_status_observed_generation|kube_statefulset_status_replicas|kube_statefulset_status_replicas_ready|kube_statefulset_status_replicas_updated|kube_statefulset_status_update_revision|kube_storageclass_info|process_start_time_seconds)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kube_node_labels is kept, but no

kube_pod_labels
kube_namespace_labels
kube_poddisruptionbudget_labels
kube_persistentvolume_labels
kube_persistentvolumeclaim_labels

we have following bugs to keep the above metrics
https://bugzilla.redhat.com/show_bug.cgi?id=2011698
https://bugzilla.redhat.com/show_bug.cgi?id=2015386
https://bugzilla.redhat.com/show_bug.cgi?id=2018431

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these but I'll have to investigate why the tool I developed didn't pick up these, I'll open an issue on the project and add a task to the GA epic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay looking quickly at the bugs these seem to be metrics that are not used in our default alerting so it's normal that they got excluded from the list. The minimal profile is very restrictive and should only contain metrics that are essential to default alerts, default rules, console and telemetry
Note that kube_persistentvolumeclaim_labels it's already on the list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also searched on the CMO repo for those metrics to double check and they are not used, so from my POV things are working correctly and those metrics should be excluded (except kube_persistentvolumeclaim_labels, which was already on the list). But do let me know if I missed something

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, it makes sense

@juzhao
Copy link

juzhao commented Mar 1, 2023

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Mar 1, 2023
@JoaoBraveCoding
Copy link
Contributor Author

/retest-required
/skip

Documentation/api.md Outdated Show resolved Hide resolved
pkg/manifests/types.go Outdated Show resolved Hide resolved
pkg/manifests/types.go Outdated Show resolved Hide resolved
pkg/manifests/types.go Outdated Show resolved Hide resolved
pkg/manifests/types.go Outdated Show resolved Hide resolved
Signed-off-by: JoaoBraveCoding <jmarcal@redhat.com>
@JoaoBraveCoding
Copy link
Contributor Author

/retest-required

2 similar comments
@raptorsun
Copy link
Contributor

/retest-required

@JoaoBraveCoding
Copy link
Contributor Author

/retest-required

@JoaoBraveCoding
Copy link
Contributor Author

/test e2e-agnostic

@simonpasquier
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 2, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 2, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f, JoaoBraveCoding, simonpasquier

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JoaoBraveCoding,jan--f,simonpasquier]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 0be07c1 and 2 for PR HEAD ad0a06d in total

@JoaoBraveCoding
Copy link
Contributor Author

/retest-required

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 8556401 and 1 for PR HEAD ad0a06d in total

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 3, 2023

@JoaoBraveCoding: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants